Performance Characterization of Matrix Multiplication on SGI Altix 3700
Authors
Abstract
Matrix multiplication is widely used and is a core component of many scientific computations, including graph theory, numerical methods, digital control, and signal processing. Multiplying large matrices requires a lot of computation time: the complexity of the standard algorithm is O(n^3), where n is the dimension of the matrix. A serial algorithm for large matrix multiplication can therefore be time consuming. A typical way to reduce the time spent on multiplication is to run a parallel program on a given parallel platform and overlap computation with communication, which decreases running time and increases efficiency. In this project, we studied several parallel algorithms for matrix multiplication, for both dense and sparse matrices. We also implemented some of these algorithms in C using the Message Passing Interface (MPI) and studied their performance on the SGI Altix 3700.
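As a rough illustration of the cost discussed in the abstract, the following minimal C sketch multiplies two dense n x n matrices with the standard triple loop and counts the multiply-add operations. It also splits the rows into p contiguous blocks, the way a simple row-wise parallel decomposition might assign work to p processes. This is not the authors' implementation: the function names `matmul_rows` and `matmul_blocked` are hypothetical, the "ranks" are simulated in a serial loop, and no MPI calls are made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): computes rows
   [row_begin, row_end) of c = a * b for n x n row-major matrices.
   Returns the number of multiply-add operations performed. */
unsigned long matmul_rows(const double *a, const double *b, double *c,
                          size_t n, size_t row_begin, size_t row_end)
{
    unsigned long ops = 0;
    for (size_t i = row_begin; i < row_end; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
                ++ops;
            }
            c[i * n + j] = sum;
        }
    }
    return ops;
}

/* Serial stand-in for p processes (an assumption for illustration):
   each simulated rank handles one contiguous block of rows.
   The total work stays n * n * n regardless of p. */
unsigned long matmul_blocked(const double *a, const double *b, double *c,
                             size_t n, size_t p)
{
    assert(p > 0);
    unsigned long total = 0;
    for (size_t rank = 0; rank < p; ++rank) {
        size_t begin = rank * n / p;       /* first row owned by this rank */
        size_t end = (rank + 1) * n / p;   /* one past the last owned row */
        total += matmul_rows(a, b, c, n, begin, end);
    }
    return total;
}
```

In a real MPI program each rank would run only its own `matmul_rows` call and the row blocks would be distributed and collected with communication routines. The sketch only shows how the n^3 operations divide evenly among ranks when n is a multiple of p.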
Similar resources
Optimizing OpenMP Parallelized DGEMM Calls on SGI Altix 3700
Using functions of parallelized mathematical libraries is a common way to accelerate numerical applications. Computer architectures with shared memory characteristics support different approaches for the implementation of such libraries, usually OpenMP or MPI. This paper’s content is based on the performance comparison of DGEMM calls (floating point matrix multiplication, double precision) with...
High Performance FFT on SGI Altix 3700
We have developed a high-performance FFT on SGI Altix 3700, improving the efficiency of the floating-point operations required to compute FFT by using a kind of loop fusion technique. As a result, we achieved a performance of 4.94 Gflops at 1-D FFT of length 4096 with an Itanium 2 1.3 GHz (95% of peak), and a performance of 28 Gflops at 2-D FFT of 4096 with 32 processors. Our FFT kernel outperf...
Interconnect Performance Evaluation of SGI Altix 3700, Cray X1, Cray Opteron, and Dell PowerEdge
We study the performance of inter-process communication on four high-speed multiprocessor systems using a set of communication benchmarks. The goal is to identify certain limiting factors and bottlenecks with the interconnect of these systems as well as to compare between these interconnects. We used several benchmarks to examine network behavior under different communication patterns and numbe...
Analyzing Mutual Influences of High Performance Computing Programs on SGI Altix 3700 and 4700 Systems with PARbench
© 2007 by John von Neumann Institute for Computing. Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher ment...
Performance of OSCAR Multigrain Parallelizing Compiler on SMP Servers
This paper describes performance of OSCAR multigrain parallelizing compiler on various SMP servers, such as IBM pSeries 690, Sun Fire V880, Sun Ultra 80, NEC TX7/i6010 and SGI Altix 3700. The OSCAR compiler hierarchically exploits the coarse grain task parallelism among loops, subroutines and basic blocks and the near fine grain parallelism among statements inside a basic block in addition to t...